Add function to compute statistics about a scheduled product #6589

Martchus · 2025-07-15T16:02:38Z

This is to be able to add hook script support for scheduled products because for this we need to be able to determine whether all jobs of a scheduled product are done (to execute the hook script in this case).

The hook script is supposed to do automatic approval/disapproval of changes. Hence it needs to know not only whether all jobs are done but also the state/result they ended up with. So this function returns these kinds of statistics instead of a binary "all done". Maybe it will need to be changed to return more high-level statistics (like "all passed"), though.
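
For illustration only, here is a minimal Python sketch of the kind of statistics described above. It is not the Perl code from this PR. The job layout (a dict with `state`, `result` and `clone_id`), the helper names and the set of final states are assumptions made for the example. It follows each restart chain to the most recent job and counts jobs per state/result.

```python
from collections import Counter, defaultdict

# Assumption for this sketch: which job states count as "finished".
FINAL_STATES = {"done", "cancelled"}


def latest_job(job_id, jobs):
    """Follow the restart chain of a job to its most recent clone."""
    seen = set()  # guards against a (hypothetical) cyclic chain
    while jobs[job_id]["clone_id"] is not None and job_id not in seen:
        seen.add(job_id)
        job_id = jobs[job_id]["clone_id"]
    return jobs[job_id]


def compute_stats(original_ids, jobs):
    """Count the latest job of each chain per state and result."""
    stats = defaultdict(Counter)
    for job_id in original_ids:
        job = latest_job(job_id, jobs)
        stats[job["state"]][job["result"]] += 1
    return {state: dict(results) for state, results in stats.items()}


def all_done(stats):
    """Derive the binary "all done" answer from the statistics."""
    return bool(stats) and all(state in FINAL_STATES for state in stats)
```

A caller that only needs the binary answer can use `all_done(compute_stats(...))`, while an approval script can inspect the per-result counts.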

Related ticket: https://progress.opensuse.org/issues/184690

Just a draft because it still needs tests and it also makes no sense to merge this without being used. However, I tested this on OSD with production data and scheduled products that have many jobs (up to 260) and a few restart chains (of depth up to 6) and it was very fast.

That's good because it means the hardest part of this feature is solved. Now I just need to use this in the Minion job where we already execute job done hooks to check whether we can execute a scheduled product hook and execute that hook in the same way we execute job done hooks. I would read the script name/path from the scheduled product settings. I would pass the scheduled product ID as the first argument and the statistics as JSON as the second argument. (As mentioned in the commit message, the statistics could be a bit more high-level than what I currently have.)
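
A hook script following this calling convention could look roughly like the Python sketch below. The shape of the JSON statistics (state, then result, then count), the result names treated as acceptable and the function names are assumptions for illustration. The PR does not fix a format yet.

```python
import json

# Assumptions for this sketch, not taken from the PR:
FINAL_STATES = {"done", "cancelled"}
OK_RESULTS = {"passed", "softfailed"}


def decide(stats):
    """Map statistics like {"done": {"passed": 3}} to a decision."""
    total = sum(sum(results.values()) for results in stats.values())
    if total == 0:
        return "pending"  # nothing to judge yet
    for state, results in stats.items():
        if state not in FINAL_STATES and sum(results.values()) > 0:
            return "pending"  # some jobs are still not finished
    for results in stats.values():
        for result, count in results.items():
            if count > 0 and result not in OK_RESULTS:
                return "disapprove"
    return "approve"


def main(argv):
    """argv[1] is the scheduled product ID, argv[2] the statistics as JSON."""
    product_id = int(argv[1])
    stats = json.loads(argv[2])
    return f"scheduled product {product_id}: {decide(stats)}"
```

A real script would call `main(sys.argv)` and then trigger the actual approval, for example via the osc tooling mentioned below.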

Then I can write the hook script. I think this could be done in Python to be able to use the osc Python libraries directly - just like qem-bot does. Or a simple shell script that invokes the osc CLI tool.

So I guess that would be the plan for this feature and it is maybe useful in the future for more than just the increment approval.

codecov · 2025-07-15T16:22:48Z

Codecov Report

Attention: Patch coverage is 16.66667% with 5 lines in your changes missing coverage. Please review.

Project coverage is 99.10%. Comparing base (fe96ee5) to head (76b7951).
Report is 24 commits behind head on master.

Files with missing lines	Patch %	Lines
lib/OpenQA/Schema/Result/ScheduledProducts.pm	16.66%	5 Missing ⚠️

❌ Your patch status has failed because the patch coverage (16.66%) is below the target coverage (100.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6589      +/-   ##
==========================================
- Coverage   99.11%   99.10%   -0.02%     
==========================================
  Files         399      399              
  Lines       40717    40723       +6     
==========================================
+ Hits        40358    40359       +1     
- Misses        359      364       +5


okurz · 2025-07-16T04:59:32Z

> So I guess that would be the plan for this feature and it is maybe useful in the future for more than just the increment approval.

Have you considered providing the information from this function in an amqp message so that external tooling can react to such a "build is done" event?

Martchus · 2025-07-16T08:50:22Z

I think emitting an amqp message would make this more challenging in two ways:

1. We still need an alternative approach to cover the case of a missed amqp message. So it would be work in addition to some other approach like an API route someone can call periodically.
2. I assume you are talking about a generic event, so openQA would emit such an amqp message for every scheduled product. Then I'm not sure whether this scales well enough. The query is fast but we need to do this query for each job that has finished in every scheduled product to determine whether the whole scheduled product is finished now. It is probably one thing to do this for selected scheduled products but another thing to do it for all. So if a scheduled product has e.g. 200 jobs we would need to check whether the scheduled product is done 200 times, each time involving a non-trivial query with recursion that checks all the 200 jobs. That makes the database go through 40000 jobs for this single scheduled product. On OSD we have lots of scheduled products and some are even bigger than 200 jobs. I haven't even taken other overhead into account. So I am not confident that this quadratic-scaling algorithm will be good enough for the generic case. (We could maybe optimize this with an early return so the number of checks would be halved on average but this would further complicate things and not change the quadratic nature of this.)
3. I'm not sure whether we have resolved the issue of allowing amqp traffic on GitLab or Open Platform.
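
The arithmetic behind the scaling concern can be reproduced with a small Python sketch. The early-return model is an assumption made for the example: jobs finish in the same order in which the check scans them, so each check stops at the first unfinished job.

```python
def rows_visited(num_jobs, early_return=False):
    """Jobs inspected in total if completeness is checked after every finished job."""
    total = 0
    for finished in range(1, num_jobs + 1):
        if early_return:
            # Scan the finished jobs, then stop at the first unfinished one.
            total += min(finished + 1, num_jobs)
        else:
            # Every check goes through all jobs of the scheduled product.
            total += num_jobs
    return total


print(rows_visited(200))                     # 40000
print(rows_visited(200, early_return=True))  # 20299
```

Under this model the early return roughly halves the work, but doubling the number of jobs still roughly quadruples it in both variants.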

We should probably only solve the problem at hand for now and in the simplest way. Point 1 made me think of an even simpler approach than a hook script: instead of running a hook script on the OSD host we could also just provide an API endpoint that returns these statistics. Then some bot can query this route once per hour (probably once every 6 hours would be enough) for relevant scheduled products if there is a pending "increment" to be approved.

This would be efficient and easy to implement because:

- If there is no increment to be approved there is no additional overhead at all within openQA. Only the bot has to check whether there's an increment and return early if not.
- The query would only be done for scheduled products we are interested in. The same goes for my hook script approach of course (but not for the next point).
- The bot could directly focus on the most recent relevant scheduled products. So in case a product is scheduled again (e.g. after amending settings) we would not deal with the old scheduled product anymore at all.
- The query wouldn't need to be done after each individual job is done but only once in a certain time interval. Hence no bad quadratic scaling.
- We don't need to check after each and every job is done whether that job (or the original job) was scheduled via a certain scheduled product. Although to be fair, we already do this anyway for the webhook-based CI integration and it is just one additional query per job.
- The bot could just be another sub-command of qem-bot. This way we could use the existing repo/helpers from qem-bot (but still wouldn't need to care about its existing logic as we would just add a new sub-command). We could also re-use its execution environment where everything is already in place to access openQA and IBS. I think especially the last point would reduce the effort quite a lot.
- We wouldn't need to introduce an additional variable to specify a hook script in scheduled product settings. (Although this wouldn't be a big deal.)
- The script wouldn't need to run on the OSD host but on GitLab so we avoid overloading OSD with more and more things.
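
The control flow of such a bot can be sketched in a few lines of Python. All four callables are hypothetical placeholders injected by the caller. They do not correspond to existing qem-bot helpers or openQA API routes.

```python
def poll_once(has_pending_increment, latest_products, fetch_stats, decide):
    """One polling run, e.g. triggered hourly by a scheduler.

    has_pending_increment() -> bool
    latest_products()       -> iterable of scheduled product IDs
    fetch_stats(product_id) -> statistics for that scheduled product
    decide(stats)           -> decision for these statistics
    """
    if not has_pending_increment():
        return []  # early return: openQA is not queried at all
    decisions = []
    for product_id in latest_products():
        # One statistics query per relevant scheduled product and run,
        # not one per finished job.
        decisions.append((product_id, decide(fetch_stats(product_id))))
    return decisions
```

Because the run returns before contacting openQA when no increment is pending, the idle case costs nothing on the openQA side.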

This approach still has one challenge, though. To find the most recent relevant scheduled product for each arch we need a query like this:

select max(id) as id, arch from scheduled_products where distri = 'sle' and version = '15.99' and flavor = 'Online-Increments' group by arch;

This is currently not very efficient because we don't have an index on the scheduled products table even though it has almost 3 million rows. (The query still seems to be fast enough for now and we only have to run it e.g. once per hour.)
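
To try the query in isolation, the following Python sketch runs it against an in-memory SQLite table. SQLite only stands in for openQA's production database here. The reduced table layout, the sample rows and the index are assumptions for the example. The index is one possible remedy and not something this comment proposes.

```python
import sqlite3

QUERY = """
select max(id) as id, arch from scheduled_products
where distri = ? and version = ? and flavor = ?
group by arch
"""


def latest_per_arch(conn, distri, version, flavor):
    """Return {arch: id of the most recent matching scheduled product}."""
    rows = conn.execute(QUERY, (distri, version, flavor))
    return {arch: product_id for product_id, arch in rows}


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table scheduled_products ("
    "id integer primary key, distri text, version text, flavor text, arch text)"
)
conn.executemany(
    "insert into scheduled_products values (?, ?, ?, ?, ?)",
    [
        (1, "sle", "15.99", "Online-Increments", "x86_64"),
        (2, "sle", "15.99", "Online-Increments", "aarch64"),
        (3, "sle", "15.99", "Online-Increments", "x86_64"),  # re-scheduled
        (4, "sle", "15.6", "Online", "x86_64"),
    ],
)
# Hypothetical index covering the filter and grouping columns.
conn.execute(
    "create index idx_sp_lookup on scheduled_products (distri, version, flavor, arch)"
)
print(latest_per_arch(conn, "sle", "15.99", "Online-Increments"))
```

For the sample rows this picks ID 3 for x86_64 and ID 2 for aarch64, so the superseded scheduled product 1 is ignored.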

I think actually all approaches have this challenge because we always have to determine whether the scheduled product we are currently dealing with (e.g. in its hook script or when receiving an amqp event about it) is still the most recent. With the API approach we would at least never have to invoke this query just to find out that the scheduled product we are currently dealing with is not relevant anymore. (And by the way, just as I'm writing this I see that this is really not a hypothetical concern, as Richard has just re-triggered the scheduled product, see https://suse.slack.com/archives/C08DC2SHABV/p1752637433485059?thread_ts=1750847709.352269&cid=C08DC2SHABV.)

Note that if we have an API as I mentioned we can still think of emitting an amqp message in addition and for specific products. (In addition as per point 1 and only for specific scheduled products to avoid point 2.)


Martchus · 2025-07-21T14:33:11Z

Closing in favor of #6592.

Martchus force-pushed the scheduled-product-hook branch from 7f78922 to d63ea02 · July 15, 2025 16:04

Martchus force-pushed the scheduled-product-hook branch from d63ea02 to 76b7951 · July 16, 2025 11:33

Martchus mentioned this pull request Jul 16, 2025

Allow querying the state of scheduled products by distri/version/flavor #6592

Merged

Martchus closed this Jul 21, 2025
